Exploratory Data Analysis With Categorical Variables: An Improved Rank-by-Feature Framework and a Case Study

Authors

  • Jinwook Seo
  • Heather Gordish-Dressman
Abstract

Multidimensional datasets often include categorical information. When most dimensions have categorical information, clustering the dataset as a whole can reveal interesting patterns in the dataset. However, the categorical information is often more useful as a way to partition the dataset: gene expression data for healthy vs. diseased samples or stock performance for common, preferred, or convertible shares. We present novel ways to utilize categorical information in exploratory data analysis by enhancing the rank-by-feature framework. First, we present ranking criteria for categorical variables and ways to improve the score overview. Second, we present a novel way to utilize the categorical information together with clustering algorithms. Users can partition the dataset according to categorical information vertically or horizontally, and the clustering result for each partition can serve as new categorical information. We report the results of a longitudinal case study with a biomedical research team, including insights gained and potential future work. Color figures are available at www.cs.umd.edu/hcil/ben60
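The partition-then-cluster idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`two_means_1d`, `cluster_within_partitions`), the field names (`status`, `expression`), the sample values, and the choice of a tiny one-dimensional 2-means clusterer are all assumptions made for illustration. The sketch partitions rows by a categorical variable, clusters each partition separately, and returns the cluster labels as a new categorical variable.

```python
from collections import defaultdict


def two_means_1d(values, iterations=20):
    """Tiny 1-D k-means with k=2 (illustrative stand-in for any clustering algorithm).

    Returns a 0/1 label for each value; centers start at the min and max.
    """
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearest center (ties go to center 0).
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def cluster_within_partitions(rows, category_key, value_key):
    """Partition rows by a categorical field, cluster each partition,
    and return one derived categorical label per row."""
    partitions = defaultdict(list)
    for index, row in enumerate(rows):
        partitions[row[category_key]].append(index)

    derived = [None] * len(rows)
    for category, indices in partitions.items():
        labels = two_means_1d([rows[i][value_key] for i in indices])
        for i, lab in zip(indices, labels):
            derived[i] = f"{category}-c{lab}"
    return derived


# Hypothetical sample data: one categorical and one numeric field per row.
rows = [
    {"status": "healthy", "expression": 1.0},
    {"status": "healthy", "expression": 1.2},
    {"status": "healthy", "expression": 5.0},
    {"status": "diseased", "expression": 9.0},
    {"status": "diseased", "expression": 2.0},
    {"status": "diseased", "expression": 2.1},
]
derived = cluster_within_partitions(rows, "status", "expression")
print(derived)
```

The derived labels can then be treated like any other categorical variable, for example as input to a ranking criterion over categorical variables.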

Journal:
  • Int. J. Hum. Comput. Interaction

Volume 23  Issue 

Pages  -

Publication date 2007